The growth of deep reinforcement learning (RL) has brought multiple exciting tools and methods to the field. This rapid expansion makes it important to understand how the individual elements of the RL toolbox interact. We approach this task from an empirical perspective by conducting a study in the continuous control setting. We present multiple insights of a fundamental nature, including: averaging multiple actors trained from the same data boosts performance; existing methods are unstable across training runs, epochs of training, and evaluation runs; the commonly used additive action noise is not required for effective training; a strategy based on posterior sampling explores better than approximated UCB combined with the weighted Bellman backup; the weighted Bellman backup alone cannot replace clipped double Q-learning; the initialization of the critics plays an important role in ensemble-based actor-critic exploration. As a conclusion, we show how existing tools can be brought together in a novel way, giving rise to the Ensemble Deep Deterministic Policy Gradients (ED2) method, which yields state-of-the-art results on continuous control tasks from OpenAI Gym MuJoCo. On the practical side, ED2 is conceptually simple, easy to code, and does not require knowledge outside of the existing RL toolbox.
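The first insight, averaging multiple actors trained from the same data, can be sketched as follows (a toy illustration with hypothetical linear actors, not the ED2 implementation):

```python
import numpy as np

def average_actor_action(actors, state):
    """Average the continuous actions proposed by an ensemble of actors.

    `actors` is a list of policies, each mapping a state vector to an
    action vector; the ensemble acts with their mean.
    """
    actions = np.stack([actor(state) for actor in actors])
    return actions.mean(axis=0)

# Toy ensemble: three linear policies with slightly different weights.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 3)) for _ in range(3)]
actors = [lambda s, W=W: W @ s for W in weights]

state = np.ones(3)
action = average_actor_action(actors, state)
print(action.shape)  # (2,)
```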
Figure 1: A five-fingered humanoid hand trained with reinforcement learning manipulating a block from an initial configuration to a goal configuration using vision for sensing.
Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards which are sparse and binary and therefore avoid the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum. We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. We show that our policies trained on a physics simulation can be deployed on a physical robot and successfully complete the task. The video presenting our experiments is available at https://goo.gl/SMrQnI.
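The core relabeling idea can be sketched as follows (a minimal sketch assuming a simplified transition format and the binary reward described above; not the authors' code):

```python
def her_relabel(trajectory):
    """Relabel a goal-conditioned trajectory with the final achieved state.

    Each transition is (state, action, achieved, goal).  The relabeled copy
    substitutes the state actually achieved at the end of the episode for
    the original goal and recomputes the sparse binary reward, so even a
    failed episode yields at least one rewarded transition.
    """
    final_achieved = trajectory[-1][2]
    relabeled = []
    for state, action, achieved, _goal in trajectory:
        reward = 1.0 if achieved == final_achieved else 0.0
        relabeled.append((state, action, achieved, final_achieved, reward))
    return relabeled

# A 3-step episode that never reached the original goal "G".
traj = [("s0", "a0", "p0", "G"), ("s1", "a1", "p1", "G"), ("s2", "a2", "p2", "G")]
print(her_relabel(traj)[-1][-1])  # 1.0: the last transition is now rewarded
```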
Audio DeepFakes are artificially generated utterances created using deep learning methods with the main aim of fooling the listeners; much of such audio is highly convincing. Their quality is sufficient to pose a serious threat in terms of security and privacy, such as the reliability of news or defamation. To prevent such threats, multiple neural-network-based methods for detecting generated speech have been proposed. In this work, we cover the topic of adversarial attacks, which decrease the performance of detectors by adding superficial (difficult to spot by a human) changes to input data. Our contribution consists of evaluating the robustness of 3 detection architectures against adversarial attacks in two scenarios (white-box and using a transferability mechanism) and then enhancing it by the use of adversarial training performed with our novel adaptive training method.
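One standard white-box attack of the kind discussed above is the Fast Gradient Sign Method; a minimal sketch on a toy linear detector follows (the model and its gradient are illustrative assumptions, not the paper's detectors):

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    """Fast Gradient Sign Method: shift the input by `eps` in the sign of
    the loss gradient, a small change that degrades a detector's score."""
    return x + eps * np.sign(grad)

# Toy logistic detector score = sigmoid(w . x); the loss gradient w.r.t.
# the input is proportional to w for a fixed label.
w = np.array([0.5, -1.0, 2.0])
x = np.array([0.1, 0.2, -0.3])
x_adv = fgsm_perturb(x, grad=w, eps=0.01)
print(np.max(np.abs(x_adv - x)))  # ~0.01: perturbation bounded by eps
```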
This paper proposes the use of an event camera as a component of a vision system that enables counting of fast-moving objects, in this case falling corn grains. This type of camera transmits information about changes in the brightness of individual pixels and is characterised by low latency, no motion blur, correct operation in different lighting conditions, and very low power consumption. The proposed counting algorithm processes events in real time. The operation of the solution was demonstrated on a test stand consisting of a chute with a vibrating feeder, which allowed the number of falling grains to be adjusted. The objective of the control system with a PID controller was to maintain a constant average number of falling objects. The proposed solution was subjected to a series of tests to verify the correctness of the developed method. On their basis, the validity of using an event camera to count small, fast-moving objects, and the associated wide range of potential industrial applications, can be confirmed.
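The control loop described above can be sketched with a textbook discrete PID controller (the gains, time step, and first-order plant model below are illustrative assumptions, not the paper's tuning):

```python
class PID:
    """Minimal discrete PID controller."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a feeder so the measured grain rate approaches 100 grains/s.
pid = PID(kp=0.5, ki=0.1, kd=0.05, setpoint=100.0)
feed, rate = 0.0, 0.0
for _ in range(200):
    feed += pid.update(rate, dt=0.01)
    rate = 0.9 * rate + 0.1 * feed  # crude first-order plant model
print(round(rate, 1))  # approaches the 100 grains/s setpoint
```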
Realistic synthetic image data rendered from 3D models can be used to augment image sets and train image classification and semantic segmentation models. In this work, we explore how high-quality physically based rendering and domain randomization can efficiently create a large synthetic dataset based on production 3D CAD models of a real vehicle. We use this dataset to quantify the effectiveness of synthetic augmentation using U-net and Double-U-net models. We found that, for this domain, synthetic images were an effective technique for augmenting limited sets of real training data. We observed that models trained on purely synthetic images had a very low mean prediction IoU on real validation images. We also observed that adding even very small amounts of real images to a synthetic dataset greatly improved accuracy, and that models trained on datasets augmented with synthetic images were more accurate than those trained on real images alone. Finally, we found that in use cases that benefit from incremental training or model specialization, pretraining a base model on synthetic images provided a sizeable reduction in the training cost of transfer learning, allowing up to 90% of the model training to be front-loaded.
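The mean prediction IoU reported above is an average of per-mask intersection-over-union values; a minimal sketch of the per-mask computation (the empty-mask convention is an assumption):

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection-over-Union between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:  # both masks empty: define IoU as 1
        return 1.0
    return np.logical_and(pred, target).sum() / union

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [1, 0]])
print(mask_iou(pred, target))  # ~0.333: 1 overlapping pixel, 3 in the union
```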
We leverage probabilistic models of neural representations to investigate how residual networks fit classes. To this end, we estimate class-conditional density models for representations learned by deep ResNets. We then use these models to characterize distributions of representations across learned classes. Surprisingly, we find that classes in the investigated models are not fitted in a uniform way. On the contrary: we uncover two groups of classes that are fitted with markedly different distributions of representations. These distinct modes of class-fitting are evident only in the deeper layers of the investigated models, indicating that they are not related to low-level image features. We show that the uncovered structure in neural representations correlates with the memorization of training examples and with adversarial robustness. Finally, we compare the class-conditional distributions of neural representations between memorized and typical examples. This allows us to uncover where in the network structure class labels arise for memorized and standard inputs.
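A minimal form of such class-conditional density models, using diagonal Gaussians over toy features (the feature dimensions and class means are illustrative assumptions, not the ResNet representations studied here), can be sketched as:

```python
import numpy as np

def fit_class_gaussians(feats, labels):
    """Fit a diagonal Gaussian to the representations of each class."""
    models = {}
    for c in np.unique(labels):
        x = feats[labels == c]
        models[c] = (x.mean(axis=0), x.var(axis=0) + 1e-6)
    return models

def log_likelihood(models, c, x):
    """Log-density of representation `x` under class `c`'s Gaussian."""
    mu, var = models[c]
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
labels = np.array([0] * 50 + [1] * 50)
models = fit_class_gaussians(feats, labels)

x = np.zeros(4)  # a representation near class 0's mean
print(log_likelihood(models, 0, x) > log_likelihood(models, 1, x))  # True
```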
Background and Purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for the histopathological image segmentation of rectal cancer, which often hampers assessment accuracy when computer technology is used to aid in diagnosis. Methods: This study provides a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, experimental results on EBHI-Seg are evaluated using classical machine learning methods and deep learning methods. Results: The experimental results showed that deep learning methods had better image segmentation performance on EBHI-Seg. The maximum Dice score for the classical machine learning methods is 0.948, while that for the deep learning methods is 0.965. Conclusion: This publicly available dataset contains 5,170 images of six types of tumor differentiation stages and the corresponding ground-truth images. The dataset can help researchers develop new segmentation algorithms for the medical diagnosis of colorectal cancer, which can be used in clinical settings to help doctors and patients.
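The Dice metric used in the evaluation above can be computed for binary masks as follows (a minimal sketch; the empty-mask convention is an assumption):

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    total = pred.sum() + target.sum()
    if total == 0:  # both masks empty: define Dice as 1
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / total

pred = np.array([1, 1, 0, 0])
target = np.array([1, 0, 1, 0])
print(dice_coefficient(pred, target))  # 0.5
```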
Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical natural products are co-consumed with pharmaceutical drugs. Understanding the mechanisms of NPDIs is key to preventing adverse events. We constructed a knowledge graph framework, NP-KG, as a step toward the computational discovery of pharmacokinetic NPDIs. NP-KG is a knowledge graph (KG) with biomedical ontologies, linked data, and the full texts of the scientific literature, constructed using the Phenotype Knowledge Translator framework and the semantic relation extraction systems SemRep and Integrated Network and Dynamic Reasoning Assembler. NP-KG was evaluated with case studies of pharmacokinetic green tea-drug and kratom-drug interactions through path searches and meta-path discovery to determine congruent and contradictory information relative to ground-truth data. The fully integrated NP-KG consists of 745,512 nodes and 7,249,576 edges. Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information. The potential pharmacokinetic mechanisms of several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions, were congruent with the published literature. NP-KG is the first KG to integrate biomedical ontologies with the full texts of scientific literature focused on natural products. We demonstrate the application of NP-KG to identify pharmacokinetic interactions involving enzymes, transporters, and pharmaceutical drugs. We envision that NP-KG will facilitate improved human-machine collaboration to guide researchers in future studies of pharmacokinetic NPDIs. The NP-KG framework is publicly available at https://doi.org/10.5281/zenodo.6814507 and https://github.com/sanyabt/np-kg.
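The path searches used in the evaluation can be illustrated with a minimal breadth-first enumeration over (head, relation, tail) triples (the toy triples below are hypothetical placeholders, not actual NP-KG content):

```python
from collections import deque

def find_paths(edges, source, target, max_len=3):
    """Breadth-first enumeration of paths of up to `max_len` edges in a
    directed knowledge graph given as (head, relation, tail) triples."""
    adj = {}
    for h, r, t in edges:
        adj.setdefault(h, []).append((r, t))
    paths, queue = [], deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target and len(path) > 1:
            paths.append(path)
            continue
        if (len(path) - 1) // 2 >= max_len:  # path stores node, rel, node, ...
            continue
        for r, t in adj.get(node, []):
            queue.append(path + [r, t])
    return paths

# Hypothetical triples sketching one purported mechanism.
edges = [
    ("kratom", "contains", "mitragynine"),
    ("mitragynine", "inhibits", "CYP3A4"),
    ("CYP3A4", "metabolizes", "midazolam"),
]
for p in find_paths(edges, "kratom", "midazolam"):
    print(" -> ".join(p))
```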